Added support to read single parquet file hosted in AWS S3 #4972
5e7f79b to b8a3bd7
- Added an async response transfer to save on copies
- Moved HEAD request to constructor
- Saved on one copy while reading
- Using a CRT based client
- Merged the two read channels
- Updated caffeine version
- Added a shared cache between channels
- Removed delete old entries from cache logic
- Made cache size configurable
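As a rough illustration of the design these bullets describe (a single size lookup at construction, and fragment reads served from a cache shared between channels), here is a minimal Python sketch. It is not the PR's Java implementation: `FragmentCache`, `RangeChannel`, `size_lookup`, `fetch_range`, and `fragment_size` are hypothetical names, the injected callables stand in for S3 HEAD and ranged GET requests, and the LRU eviction is an assumption of the sketch rather than something the commit message states.

```python
from collections import OrderedDict
from typing import Callable, Tuple


class FragmentCache:
    """Bounded cache of byte fragments, shared between channels (hypothetical)."""

    def __init__(self, max_fragments: int) -> None:
        self.max_fragments = max_fragments  # configurable cache size
        self._entries: "OrderedDict[Tuple[str, int], bytes]" = OrderedDict()

    def get(self, key: Tuple[str, int], loader: Callable[[], bytes]) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = loader()
        self._entries[key] = data
        if len(self._entries) > self.max_fragments:
            self._entries.popitem(last=False)  # assumed LRU eviction
        return data


class RangeChannel:
    """Seekable reader that fetches fixed-size fragments through a shared cache."""

    def __init__(
        self,
        uri: str,
        size_lookup: Callable[[str], int],
        fetch_range: Callable[[str, int, int], bytes],
        cache: FragmentCache,
        fragment_size: int = 4,
    ) -> None:
        self.uri = uri
        self.size = size_lookup(uri)  # done once here; stands in for the HEAD request
        self.fetch_range = fetch_range
        self.cache = cache
        self.fragment_size = fragment_size
        self.position = 0

    def read(self, n: int) -> bytes:
        end = min(self.position + n, self.size)
        out = bytearray()
        while self.position < end:
            index = self.position // self.fragment_size
            start = index * self.fragment_size
            stop = min(start + self.fragment_size, self.size)
            fragment = self.cache.get(
                (self.uri, index),
                lambda: self.fetch_range(self.uri, start, stop),
            )
            offset = self.position - start
            take = min(end, stop) - self.position
            out += fragment[offset:offset + take]
            self.position += take
        return bytes(out)
```

Two `RangeChannel` instances built with the same `FragmentCache` reuse each other's fragments, so a second reader of the same object only fetches the ranges the first one did not touch.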
Removed Caffeine
dfdbad0 to dab479f
Also added support for Python
dab479f to 0572372
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/ParquetTools.java
...table/src/main/java/io/deephaven/engine/table/impl/locations/local/FileTableLocationKey.java
extensions/parquet/base/src/main/java/io/deephaven/parquet/base/ColumnPageReaderImpl.java
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/S3ParquetInstructions.java
...parquet/table/src/main/java/io/deephaven/parquet/table/location/ParquetTableLocationKey.java
.../table/src/main/java/io/deephaven/parquet/table/util/ByteBufferAsyncResponseTransformer.java
...sions/parquet/table/src/main/java/io/deephaven/parquet/table/util/S3SeekableByteChannel.java
...nsions/parquet/table/src/test/java/io/deephaven/parquet/table/ParquetTableReadWriteTest.java
0572372 to edbd29a
Very close now.
extensions/s3/src/main/java/io/deephaven/extensions/s3/ByteBufferAsyncResponseTransformer.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3SeekableByteChannel.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3SeekableChannelProvider.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/AwsCredentialsImpl.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/S3Instructions.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/AwsCredentials.java
from deephaven.column import Column
from deephaven.dtypes import DType
from deephaven.table import Table
from deephaven.experimental import s3
This is an interesting case. We have experimental bleeding into our normal API. I may be ok with this, but @jmao-denver should weigh in. I may be ok with it because it is obvious it is experimental, so yanking the rug later could be ok.
Do you expect the API to change going forward? If not, it could be promoted out of experimental. If it needs baking time, leave it there.
It does need more baking time, especially since we will be adding more capabilities to it very soon. Right now, we only support reading a single parquet file. Going forward, we want to support reading partitioned data, metadata files, etc.
If we allow an experimental feature to bleed into the normal API, maybe a factory/helper function s3_instructions(...) is more desirable than forcing the user to use experimental.s3.S3Instructions directly? At least, when we decide to graduate S3Instructions from experimental, the user doesn't need to change the code, even though code change comes with the territory of using experimental features. But I am OK with it as is. @chipkent your thoughts?
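To make the suggestion concrete, here is a minimal sketch of what such a helper could look like. It is only an illustration, not code from this PR: the `S3Instructions` class below is a local stand-in for the experimental class, and the `region_name` and `max_concurrent_requests` parameters are assumed for the example rather than taken from the actual interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class S3Instructions:
    """Stand-in for the experimental class; the fields are illustrative assumptions."""

    region_name: str
    max_concurrent_requests: Optional[int] = None


def s3_instructions(
    region_name: str,
    max_concurrent_requests: Optional[int] = None,
) -> S3Instructions:
    """Hypothetical helper exposed from the stable API.

    Callers depend only on this function, so moving S3Instructions out of
    the experimental package later would not require changes on their side.
    """
    return S3Instructions(
        region_name=region_name,
        max_concurrent_requests=max_concurrent_requests,
    )


# Usage: the caller never names the experimental module directly.
instructions = s3_instructions("us-east-1", max_concurrent_requests=10)
```

The trade-off is the one raised in this thread: the helper spares users a later import change, but it also hides the fact that the underlying feature is still experimental.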
I'd probably lean to what is currently here because it makes very explicit that it is experimental. I would kind of rather force them to change the code because it also forces them to know it is experimental.
Minor code style issue found and a possible interface change suggestion.
LGTM
Labels indicate documentation is required. Issues for documentation have been opened: Community: https://github.com/deephaven/deephaven.io/issues/3658
Will add a separate PR if we need more changes around the Python interface.
Related to #4836
Notes for documentation: https://docs.google.com/document/d/1X_laDtgURkoOnb8eQyuT_gSFJhplYf18d9qgEBIY_A0/edit?usp=sharing